209 research outputs found

    Towards Benchmarking Multi-Model Databases

    Multi-model database. Non peer reviewed

    Jiaheng Joplin Lu, Viola

    Cello Suite No. 4 in E-flat Major, BWV 1010 / J.S. Bach; Louange à l'Éternité de Jésus / Olivier Messiaen; Theme and Variations for viola & piano / Alan Shulman

    Jiaheng Joplin Lu, Viola

    Cello Suite No. 3, Prelude / J.S. Bach; Concerto for viola and orchestra / Béla Bartók; Audition Excerpts; Elegiac Trio / Arnold Bax

    Jiaheng Joplin Lu, Viola

    Sonata Op. 120 No. 2 / Johannes Brahms; Le tombeau de Ravel / Arthur Benjamin

    Performance Models of Data Parallel DAG Workflows for Large Scale Data Analytics

    Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. Building an accurate performance model for a DAG on data-parallel frameworks (e.g., MapReduce) is critical for implementing autonomic, self-managing big data systems. Such a model is hard to build because the allocation of preemptable system resources among parallel jobs may vary dynamically during execution, which makes execution time difficult to estimate accurately. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), which estimates the allocation of preemptable resources by identifying the bottleneck in order to accurately predict task execution time. For a DAG workflow, we propose a state-based approach that iteratively uses the resource allocation property among stages to estimate the overall execution plan. Extensive experiments with HiBench and TPC-H workloads validate these cost models. The BOE model outperforms state-of-the-art models by a factor of five for task execution time estimation. Peer reviewed
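
    The abstract does not give BOE's equations, so the following is only a minimal sketch of the general idea under stated assumptions: each stage needs a fixed amount of work per resource, concurrently running stages split each resource evenly (a simplification of preemptable allocation), and a stage progresses at the rate allowed by its scarcest (bottleneck) resource. The state-based part advances from one stage completion to the next, recomputing allocations whenever the set of running stages changes. The names Stage, bottleneck_rate, and estimate_makespan and the resource labels "cpu" and "disk" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stage model: total work per resource plus upstream stages."""
    name: str
    demand: dict[str, float]  # e.g. {"cpu": 40.0, "disk": 800.0}; units are illustrative
    deps: list[str] = field(default_factory=list)


def bottleneck_rate(stage, running, capacity):
    """Return (fraction of the stage finished per second, bottleneck resource).

    Assumption (not from the paper): each resource's capacity is split evenly
    among the running stages that use it, and a stage advances only as fast
    as its scarcest share allows.
    """
    best_rate, bottleneck = float("inf"), None
    for res, amount in stage.demand.items():
        if amount <= 0:
            continue
        users = sum(1 for s in running if s.demand.get(res, 0) > 0)
        rate = (capacity[res] / users) / amount
        if rate < best_rate:
            best_rate, bottleneck = rate, res
    return best_rate, bottleneck


def estimate_makespan(stages, capacity):
    """State-based estimate: jump from one stage completion to the next."""
    remaining = {s.name: 1.0 for s in stages}  # fraction of each stage's work left
    done, clock = set(), 0.0
    while len(done) < len(stages):
        running = [s for s in stages
                   if s.name not in done and all(d in done for d in s.deps)]
        if not running:
            raise ValueError("dependency cycle or unknown dependency")
        rates = {s.name: bottleneck_rate(s, running, capacity)[0] for s in running}
        # The current state lasts until the first running stage finishes.
        step = min(remaining[n] / r for n, r in rates.items())
        clock += step
        for n, r in rates.items():
            # Stages with no demand (infinite rate) finish immediately.
            remaining[n] = 0.0 if r == float("inf") else remaining[n] - r * step
            if remaining[n] <= 1e-9:
                done.add(n)
    return clock
```

    For example, with capacities cpu=10 and disk=100 per second, a stage needing 20 CPU units and 100 disk units is CPU-bound and takes 2 s. A dependent stage needing 400 disk units then takes 4 s, so the estimate is 6.0 s. A real cost model would replace the even-split assumption with measured or modeled allocations.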

    Multi-model Data Management : What's New and What's Next?

    Tutorial. As more businesses realize that data, in all forms and sizes, is critical to making the best possible decisions, we see the continued growth of systems that support massive volumes of non-relational or unstructured data. Nothing shows this more starkly than the Gartner Magic Quadrant for operational database management systems, which assumes that, by 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform. A single data platform for managing both well-structured data and NoSQL data benefits users: it significantly reduces integration, migration, development, maintenance, and operational issues. A key research challenge, therefore, is how to develop an efficient, consolidated data management platform that covers both relational and NoSQL data, reducing integration issues, simplifying operations, and eliminating migration issues. In this tutorial, we review previous work on multi-model data management and provide insights into the research challenges and directions for future work. The slides and additional materials for this tutorial are available at http://udbms.cs.helsinki.fi/?tutorials/edbt2017. Peer reviewed

    MORTAL: A Tool of Automatically Designing Relational Storage Schemas for Multi-model Data through Reinforcement Learning

    Because relational databases offer powerful capabilities for security, user authentication, query optimization, and more, several commercial and academic frameworks reuse relational databases to store and query semi-structured data (e.g., XML, JSON) or graph data (e.g., RDF, property graphs). However, these works focus on managing only one of these data models with an RDBMS; they do not provide tools that automatically generate a relational schema for storing multi-model data. In this demonstration, we present MORTAL, a novel tool based on reinforcement learning. Given multi-model data containing different data models and a set of queries, MORTAL automatically designs a relational schema that stores the data while delivering good query performance. The demonstration is organized around the following modules: generating the initial state from the loaded multi-model data, influencing the learning process by setting parameters, controlling the generated relational schema by providing semantic constraints, improving the query performance of the relational schema by specifying queries, and a highly interactive interface that shows query performance and storage consumption as users adjust the generated relational schema. Peer reviewed
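
    The abstract does not describe MORTAL's states, actions, or rewards. To make "learning a relational schema" concrete, here is a toy sketch under loudly stated assumptions: one JSON document type whose nested fields can each be inlined as a JSON column in the parent table or split into a child table, hypothetical per-field costs standing in for a query workload, and plain tabular Q-learning standing in for whatever learner MORTAL actually uses. The field names, costs, and functions (cost, toggle, learn_q, greedy_schema) are all hypothetical.

```python
import itertools
import random

# Hypothetical nested fields of a JSON "order" document with hypothetical
# (inline_cost, split_cost): inline = JSON column in the parent table,
# split = child table joined by a foreign key.
FIELD_COSTS = {
    "customer": (2.0, 5.0),
    "items": (9.0, 4.0),
    "shipping": (6.0, 3.0),
}
FIELDS = list(FIELD_COSTS)
STOP = 0  # action 0 ends the episode; action i > 0 toggles FIELDS[i - 1]
N_ACTIONS = len(FIELDS) + 1


def cost(schema):
    """Toy workload cost of a schema (tuple of bools, True = split)."""
    return sum(FIELD_COSTS[f][int(split)] for f, split in zip(FIELDS, schema))


def toggle(schema, i):
    """Flip the inline/split decision for field i."""
    return schema[:i] + (not schema[i],) + schema[i + 1:]


def learn_q(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning; a move's reward is the cost reduction it achieves."""
    rng = random.Random(seed)
    states = list(itertools.product((False, True), repeat=len(FIELDS)))
    q = {(s, a): 0.0 for s in states for a in range(N_ACTIONS)}
    for _ in range(episodes):
        state = rng.choice(states)  # exploring starts cover every schema
        for _ in range(2 * len(FIELDS)):
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[(state, a)])
            if action == STOP:  # terminal: zero reward, no successor
                q[(state, STOP)] += alpha * (0.0 - q[(state, STOP)])
                break
            nxt = toggle(state, action - 1)
            target = (cost(state) - cost(nxt)) + gamma * max(
                q[(nxt, a)] for a in range(N_ACTIONS))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_schema(q, start):
    """Follow the learned policy from `start` until it chooses STOP."""
    state = start
    for _ in range(2 * len(FIELDS)):
        action = max(range(N_ACTIONS), key=lambda a: q[(state, a)])
        if action == STOP:
            break
        state = toggle(state, action - 1)
    return state
```

    Starting from an all-inline schema, the greedy policy should end by keeping customer inline and splitting items and shipping. Because these toy costs are independent per field, a simple search would find the same answer. A learned policy becomes worthwhile only when decisions interact, as they do in real schema design.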

    Worst Case Optimal Joins on Relational and XML data

    In the modern data management ecosystem, one of the greatest challenges is data variety. Data comes in multiple formats, such as relational and (semi-)structured data. A traditional database handles a single data format, so its ability to deal with different formats is limited. To overcome this limitation, we propose a multi-model processing framework for relational and semi-structured data (i.e., XML) and design a worst-case optimal join algorithm for it. The salient feature of our algorithm is that it guarantees that the intermediate results are no larger than the worst-case size of the join result. Preliminary results show that our multi-model algorithm significantly outperforms baseline join methods in both running time and intermediate result size. Peer reviewed
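
    The abstract does not spell out the algorithm. As relational background only, the sketch below shows the variable-at-a-time idea commonly known as Generic Join, which underlies worst-case optimal joins. It binds one query variable at a time, intersects the candidate values from every relation that mentions that variable, iterates over the smallest candidate set, and probes the others by hash lookup. It does not model XML data or the paper's multi-model extension. The function names and the triangle-query data are illustrative.

```python
def build_trie(attrs, tuples, order):
    """Index a relation as nested dicts keyed by its attributes in the global order."""
    ordered = sorted(attrs, key=order.index)
    trie = {}
    for t in tuples:
        row = dict(zip(attrs, t))
        node = trie
        for a in ordered:
            node = node.setdefault(row[a], {})
    return ordered, trie


def generic_join(relations, order):
    """Variable-at-a-time join.

    relations: list of (attribute names, set of tuples).
    order: global variable order; every variable must occur in some relation.
    """
    tries = [build_trie(attrs, tuples, order) for attrs, tuples in relations]
    out = []

    def extend(depth, nodes, binding):
        # nodes[i] is relation i's trie node after binding its earlier variables
        if depth == len(order):
            out.append(tuple(binding))
            return
        var = order[depth]
        active = [i for i, (attrs, _) in enumerate(tries) if var in attrs]
        # Iterate over the smallest candidate set and probe the others by hash
        # lookup; sorting only makes the output order deterministic.
        smallest = min(active, key=lambda i: len(nodes[i]))
        for value in sorted(nodes[smallest]):
            if all(value in nodes[i] for i in active):
                child = list(nodes)
                for i in active:
                    child[i] = nodes[i][value]
                extend(depth + 1, child, binding + [value])

    extend(0, [trie for _, trie in tries], [])
    return out
```

    On the triangle query R(a, b) ⋈ S(b, c) ⋈ T(a, c), this approach never materializes the pairwise join R ⋈ S, which can be far larger than the final result. Avoiding that blow-up is the intermediate-size property the abstract refers to.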